Refactor binary HNSW stats to use OpenMP reduction instead of destructor sync (#4910)#4910
Closed
sharm235 wants to merge 1 commit into
Closed
Refactor binary HNSW stats to use OpenMP reduction instead of destructor sync (#4910)#4910sharm235 wants to merge 1 commit into
sharm235 wants to merge 1 commit into
Conversation
Contributor
sharm235
pushed a commit
to sharm235/faiss
that referenced
this pull request
Mar 13, 2026
…tor sync (facebookresearch#4910) Summary: ## Problem Follow-up to the previous fix (replacing `#pragma omp critical` with `#pragma omp atomic` in `FlatHammingDis::~FlatHammingDis`). This commit implements the architecturally correct fix by matching the pattern already used in the float HNSW path (`IndexHNSW.cpp`). The previous approach had two issues: 1. Stats accumulation in a destructor is fragile — the timing and frequency of destructor calls depends on the caller's object lifetime management, not on the search algorithm 2. `FlatHammingDis` maintained a redundant `ndis` counter (incremented on every `operator()` call) that duplicated what `HNSW::search()` already tracks internally via its returned `HNSWStats` ## Changes ### `IndexBinaryHNSW::search()` — use `#pragma omp for reduction` - Capture the `HNSWStats` returned by `hnsw.search()` per query - Accumulate `n1, n2, ndis, nhops` via `#pragma omp for reduction(+: ...)`, which uses lock-free per-thread copies merged at barrier — zero contention - Call `hnsw_stats.combine()` once, single-threaded, after the parallel region - This exactly matches the pattern in `hnsw_search()` in `IndexHNSW.cpp` (lines 259-291) ### `FlatHammingDis` — remove stats accumulation entirely - Remove `size_t ndis` field (redundant with `HNSW::search()` internal tracking) - Remove `ndis++` from `operator()` (eliminates per-distance-computation overhead) - Remove destructor (no stats to accumulate → no sync needed) ### `IndexBinaryHNSWCagra::search()` — fix missing stats aggregation - The `base_level_only` path's second parallel block (`search_level_0`) accumulated stats into a per-thread `HNSWStats search_stats` but never aggregated it into `hnsw_stats` — this was a pre-existing bug where stats were silently lost - Added `#pragma omp critical { hnsw_stats.combine(search_stats); }` once per thread at the end of the parallel block, matching `IndexHNSW::search_level_0_wrapper()` (line 461) ## Performance Impact - **Zero lock contention** in the main `IndexBinaryHNSW::search()` path (reduction is lock-free) - Removes per-distance-computation `ndis++` overhead from `FlatHammingDis::operator()` - `IndexBinaryHNSWCagra` path: `#pragma omp critical` once per thread (acceptable, matches float HNSW) - No change to search results — only stats accumulation mechanism changes Reviewed By: mnorris11 Differential Revision: D95911440
2fdaa78 to
4f2d46d
Compare
sharm235
pushed a commit
to sharm235/faiss
that referenced
this pull request
Mar 13, 2026
…tor sync (facebookresearch#4910) Summary: ## Problem Follow-up to the previous fix (replacing `#pragma omp critical` with `#pragma omp atomic` in `FlatHammingDis::~FlatHammingDis`). This commit implements the architecturally correct fix by matching the pattern already used in the float HNSW path (`IndexHNSW.cpp`). The previous approach had two issues: 1. Stats accumulation in a destructor is fragile — the timing and frequency of destructor calls depends on the caller's object lifetime management, not on the search algorithm 2. `FlatHammingDis` maintained a redundant `ndis` counter (incremented on every `operator()` call) that duplicated what `HNSW::search()` already tracks internally via its returned `HNSWStats` ## Changes ### `IndexBinaryHNSW::search()` — use `#pragma omp for reduction` - Capture the `HNSWStats` returned by `hnsw.search()` per query - Accumulate `n1, n2, ndis, nhops` via `#pragma omp for reduction(+: ...)`, which uses lock-free per-thread copies merged at barrier — zero contention - Call `hnsw_stats.combine()` once, single-threaded, after the parallel region - This exactly matches the pattern in `hnsw_search()` in `IndexHNSW.cpp` (lines 259-291) ### `FlatHammingDis` — remove stats accumulation entirely - Remove `size_t ndis` field (redundant with `HNSW::search()` internal tracking) - Remove `ndis++` from `operator()` (eliminates per-distance-computation overhead) - Remove destructor (no stats to accumulate → no sync needed) ### `IndexBinaryHNSWCagra::search()` — fix missing stats aggregation - The `base_level_only` path's second parallel block (`search_level_0`) accumulated stats into a per-thread `HNSWStats search_stats` but never aggregated it into `hnsw_stats` — this was a pre-existing bug where stats were silently lost - Added `#pragma omp critical { hnsw_stats.combine(search_stats); }` once per thread at the end of the parallel block, matching `IndexHNSW::search_level_0_wrapper()` (line 461) ## Performance Impact - **Zero lock contention** in the main `IndexBinaryHNSW::search()` path (reduction is lock-free) - Removes per-distance-computation `ndis++` overhead from `FlatHammingDis::operator()` - `IndexBinaryHNSWCagra` path: `#pragma omp critical` once per thread (acceptable, matches float HNSW) - No change to search results — only stats accumulation mechanism changes Reviewed By: mnorris11 Differential Revision: D95911440
4f2d46d to
67d5ff7
Compare
sharm235
pushed a commit
to sharm235/faiss
that referenced
this pull request
Mar 13, 2026
…tor sync (facebookresearch#4910) Summary: ## Problem Follow-up to the previous fix (replacing `#pragma omp critical` with `#pragma omp atomic` in `FlatHammingDis::~FlatHammingDis`). This commit implements the architecturally correct fix by matching the pattern already used in the float HNSW path (`IndexHNSW.cpp`). The previous approach had two issues: 1. Stats accumulation in a destructor is fragile — the timing and frequency of destructor calls depends on the caller's object lifetime management, not on the search algorithm 2. `FlatHammingDis` maintained a redundant `ndis` counter (incremented on every `operator()` call) that duplicated what `HNSW::search()` already tracks internally via its returned `HNSWStats` ## Changes ### `IndexBinaryHNSW::search()` — use `#pragma omp for reduction` - Capture the `HNSWStats` returned by `hnsw.search()` per query - Accumulate `n1, n2, ndis, nhops` via `#pragma omp for reduction(+: ...)`, which uses lock-free per-thread copies merged at barrier — zero contention - Call `hnsw_stats.combine()` once, single-threaded, after the parallel region - This exactly matches the pattern in `hnsw_search()` in `IndexHNSW.cpp` (lines 259-291) ### `FlatHammingDis` — remove stats accumulation entirely - Remove `size_t ndis` field (redundant with `HNSW::search()` internal tracking) - Remove `ndis++` from `operator()` (eliminates per-distance-computation overhead) - Remove destructor (no stats to accumulate → no sync needed) ### `IndexBinaryHNSWCagra::search()` — fix missing stats aggregation - The `base_level_only` path's second parallel block (`search_level_0`) accumulated stats into a per-thread `HNSWStats search_stats` but never aggregated it into `hnsw_stats` — this was a pre-existing bug where stats were silently lost - Added `#pragma omp critical { hnsw_stats.combine(search_stats); }` once per thread at the end of the parallel block, matching `IndexHNSW::search_level_0_wrapper()` (line 461) ## Performance Impact - **Zero lock contention** in the main `IndexBinaryHNSW::search()` path (reduction is lock-free) - Removes per-distance-computation `ndis++` overhead from `FlatHammingDis::operator()` - `IndexBinaryHNSWCagra` path: `#pragma omp critical` once per thread (acceptable, matches float HNSW) - No change to search results — only stats accumulation mechanism changes Reviewed By: mnorris11 Differential Revision: D95911440
67d5ff7 to
0605edb
Compare
sharm235
pushed a commit
to sharm235/faiss
that referenced
this pull request
Mar 13, 2026
…tor sync (facebookresearch#4910) Summary: Pull Request resolved: facebookresearch#4910 ## Problem Follow-up to the previous fix (replacing `#pragma omp critical` with `#pragma omp atomic` in `FlatHammingDis::~FlatHammingDis`). This commit implements the architecturally correct fix by matching the pattern already used in the float HNSW path (`IndexHNSW.cpp`). The previous approach had two issues: 1. Stats accumulation in a destructor is fragile — the timing and frequency of destructor calls depends on the caller's object lifetime management, not on the search algorithm 2. `FlatHammingDis` maintained a redundant `ndis` counter (incremented on every `operator()` call) that duplicated what `HNSW::search()` already tracks internally via its returned `HNSWStats` ## Changes ### `IndexBinaryHNSW::search()` — use `#pragma omp for reduction` - Capture the `HNSWStats` returned by `hnsw.search()` per query - Accumulate `n1, n2, ndis, nhops` via `#pragma omp for reduction(+: ...)`, which uses lock-free per-thread copies merged at barrier — zero contention - Call `hnsw_stats.combine()` once, single-threaded, after the parallel region - This exactly matches the pattern in `hnsw_search()` in `IndexHNSW.cpp` (lines 259-291) ### `FlatHammingDis` — remove stats accumulation entirely - Remove `size_t ndis` field (redundant with `HNSW::search()` internal tracking) - Remove `ndis++` from `operator()` (eliminates per-distance-computation overhead) - Remove destructor (no stats to accumulate → no sync needed) ### `IndexBinaryHNSWCagra::search()` — fix missing stats aggregation - The `base_level_only` path's second parallel block (`search_level_0`) accumulated stats into a per-thread `HNSWStats search_stats` but never aggregated it into `hnsw_stats` — this was a pre-existing bug where stats were silently lost - Added `#pragma omp critical { hnsw_stats.combine(search_stats); }` once per thread at the end of the parallel block, matching `IndexHNSW::search_level_0_wrapper()` (line 461) ## Performance Impact - **Zero lock contention** in the main `IndexBinaryHNSW::search()` path (reduction is lock-free) - Removes per-distance-computation `ndis++` overhead from `FlatHammingDis::operator()` - `IndexBinaryHNSWCagra` path: `#pragma omp critical` once per thread (acceptable, matches float HNSW) - No change to search results — only stats accumulation mechanism changes Reviewed By: mnorris11 Differential Revision: D95911440
9a89b7d to
e3ddfed
Compare
sharm235
pushed a commit
to sharm235/faiss
that referenced
this pull request
Mar 13, 2026
…tor sync (facebookresearch#4910) Summary: ## Problem Follow-up to the previous fix (replacing `#pragma omp critical` with `#pragma omp atomic` in `FlatHammingDis::~FlatHammingDis`). This commit implements the architecturally correct fix by matching the pattern already used in the float HNSW path (`IndexHNSW.cpp`). The previous approach had two issues: 1. Stats accumulation in a destructor is fragile — the timing and frequency of destructor calls depends on the caller's object lifetime management, not on the search algorithm 2. `FlatHammingDis` maintained a redundant `ndis` counter (incremented on every `operator()` call) that duplicated what `HNSW::search()` already tracks internally via its returned `HNSWStats` ## Changes ### `IndexBinaryHNSW::search()` — use `#pragma omp for reduction` - Capture the `HNSWStats` returned by `hnsw.search()` per query - Accumulate `n1, n2, ndis, nhops` via `#pragma omp for reduction(+: ...)`, which uses lock-free per-thread copies merged at barrier — zero contention - Call `hnsw_stats.combine()` once, single-threaded, after the parallel region - This exactly matches the pattern in `hnsw_search()` in `IndexHNSW.cpp` (lines 259-291) ### `FlatHammingDis` — remove stats accumulation entirely - Remove `size_t ndis` field (redundant with `HNSW::search()` internal tracking) - Remove `ndis++` from `operator()` (eliminates per-distance-computation overhead) - Remove destructor (no stats to accumulate → no sync needed) ### `IndexBinaryHNSWCagra::search()` — fix missing stats aggregation - The `base_level_only` path's second parallel block (`search_level_0`) accumulated stats into a per-thread `HNSWStats search_stats` but never aggregated it into `hnsw_stats` — this was a pre-existing bug where stats were silently lost - Added `#pragma omp critical { hnsw_stats.combine(search_stats); }` once per thread at the end of the parallel block, matching `IndexHNSW::search_level_0_wrapper()` (line 461) ## Performance Impact - **Zero lock contention** in the main `IndexBinaryHNSW::search()` path (reduction is lock-free) - Removes per-distance-computation `ndis++` overhead from `FlatHammingDis::operator()` - `IndexBinaryHNSWCagra` path: `#pragma omp critical` once per thread (acceptable, matches float HNSW) - No change to search results — only stats accumulation mechanism changes Reviewed By: mnorris11 Differential Revision: D95911440
sharm235
pushed a commit
to sharm235/faiss
that referenced
this pull request
Mar 13, 2026
…tor sync (facebookresearch#4910) Summary: ## Problem Follow-up to the previous fix (replacing `#pragma omp critical` with `#pragma omp atomic` in `FlatHammingDis::~FlatHammingDis`). This commit implements the architecturally correct fix by matching the pattern already used in the float HNSW path (`IndexHNSW.cpp`). The previous approach had two issues: 1. Stats accumulation in a destructor is fragile — the timing and frequency of destructor calls depends on the caller's object lifetime management, not on the search algorithm 2. `FlatHammingDis` maintained a redundant `ndis` counter (incremented on every `operator()` call) that duplicated what `HNSW::search()` already tracks internally via its returned `HNSWStats` ## Changes ### `IndexBinaryHNSW::search()` — use `#pragma omp for reduction` - Capture the `HNSWStats` returned by `hnsw.search()` per query - Accumulate `n1, n2, ndis, nhops` via `#pragma omp for reduction(+: ...)`, which uses lock-free per-thread copies merged at barrier — zero contention - Call `hnsw_stats.combine()` once, single-threaded, after the parallel region - This exactly matches the pattern in `hnsw_search()` in `IndexHNSW.cpp` (lines 259-291) ### `FlatHammingDis` — remove stats accumulation entirely - Remove `size_t ndis` field (redundant with `HNSW::search()` internal tracking) - Remove `ndis++` from `operator()` (eliminates per-distance-computation overhead) - Remove destructor (no stats to accumulate → no sync needed) ### `IndexBinaryHNSWCagra::search()` — fix missing stats aggregation - The `base_level_only` path's second parallel block (`search_level_0`) accumulated stats into a per-thread `HNSWStats search_stats` but never aggregated it into `hnsw_stats` — this was a pre-existing bug where stats were silently lost - Added `#pragma omp critical { hnsw_stats.combine(search_stats); }` once per thread at the end of the parallel block, matching `IndexHNSW::search_level_0_wrapper()` (line 461) ## Performance Impact - **Zero lock contention** in the main `IndexBinaryHNSW::search()` path (reduction is lock-free) - Removes per-distance-computation `ndis++` overhead from `FlatHammingDis::operator()` - `IndexBinaryHNSWCagra` path: `#pragma omp critical` once per thread (acceptable, matches float HNSW) - No change to search results — only stats accumulation mechanism changes Reviewed By: mnorris11 Differential Revision: D95911440
e3ddfed to
0c5da12
Compare
…tor sync (facebookresearch#4910) Summary: Pull Request resolved: facebookresearch#4910 ## Problem Follow-up to the previous fix (replacing `#pragma omp critical` with `#pragma omp atomic` in `FlatHammingDis::~FlatHammingDis`). This commit implements the architecturally correct fix by matching the pattern already used in the float HNSW path (`IndexHNSW.cpp`). The previous approach had two issues: 1. Stats accumulation in a destructor is fragile — the timing and frequency of destructor calls depends on the caller's object lifetime management, not on the search algorithm 2. `FlatHammingDis` maintained a redundant `ndis` counter (incremented on every `operator()` call) that duplicated what `HNSW::search()` already tracks internally via its returned `HNSWStats` ## Changes ### `IndexBinaryHNSW::search()` — use `#pragma omp for reduction` - Capture the `HNSWStats` returned by `hnsw.search()` per query - Accumulate `n1, n2, ndis, nhops` via `#pragma omp for reduction(+: ...)`, which uses lock-free per-thread copies merged at barrier — zero contention - Call `hnsw_stats.combine()` once, single-threaded, after the parallel region - This exactly matches the pattern in `hnsw_search()` in `IndexHNSW.cpp` (lines 259-291) ### `FlatHammingDis` — remove stats accumulation entirely - Remove `size_t ndis` field (redundant with `HNSW::search()` internal tracking) - Remove `ndis++` from `operator()` (eliminates per-distance-computation overhead) - Remove destructor (no stats to accumulate → no sync needed) ### `IndexBinaryHNSWCagra::search()` — fix missing stats aggregation - The `base_level_only` path's second parallel block (`search_level_0`) accumulated stats into a per-thread `HNSWStats search_stats` but never aggregated it into `hnsw_stats` — this was a pre-existing bug where stats were silently lost - Added `#pragma omp critical { hnsw_stats.combine(search_stats); }` once per thread at the end of the parallel block, matching `IndexHNSW::search_level_0_wrapper()` (line 461) ## Performance Impact - **Zero lock contention** in the main `IndexBinaryHNSW::search()` path (reduction is lock-free) - Removes per-distance-computation `ndis++` overhead from `FlatHammingDis::operator()` - `IndexBinaryHNSWCagra` path: `#pragma omp critical` once per thread (acceptable, matches float HNSW) - No change to search results — only stats accumulation mechanism changes Reviewed By: mnorris11 Differential Revision: D95911440
0c5da12 to
faad9ef
Compare
Contributor
|
This pull request has been merged in 5dcef18. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary:
Problem
Follow-up to the previous fix (replacing
#pragma omp criticalwith#pragma omp atomicinFlatHammingDis::~FlatHammingDis). This commit implements the architecturally correct fix by matching the pattern already used in the float HNSW path (IndexHNSW.cpp).The previous approach had two issues:
FlatHammingDismaintained a redundantndiscounter (incremented on everyoperator()call) that duplicated whatHNSW::search()already tracks internally via its returnedHNSWStatsChanges
IndexBinaryHNSW::search()— use#pragma omp for reductionHNSWStatsreturned byhnsw.search()per queryn1, n2, ndis, nhopsvia#pragma omp for reduction(+: ...), which uses lock-free per-thread copies merged at barrier — zero contentionhnsw_stats.combine()once, single-threaded, after the parallel regionhnsw_search()inIndexHNSW.cpp(lines 259-291)FlatHammingDis— remove stats accumulation entirelysize_t ndisfield (redundant withHNSW::search()internal tracking)ndis++fromoperator()(eliminates per-distance-computation overhead)IndexBinaryHNSWCagra::search()— fix missing stats aggregationbase_level_onlypath's second parallel block (search_level_0) accumulated stats into a per-threadHNSWStats search_statsbut never aggregated it intohnsw_stats— this was a pre-existing bug where stats were silently lost#pragma omp critical { hnsw_stats.combine(search_stats); }once per thread at the end of the parallel block, matchingIndexHNSW::search_level_0_wrapper()(line 461)Performance Impact
IndexBinaryHNSW::search()path (reduction is lock-free)ndis++overhead fromFlatHammingDis::operator()IndexBinaryHNSWCagrapath:#pragma omp criticalonce per thread (acceptable, matches float HNSW)Reviewed By: mnorris11
Differential Revision: D95911440